Skip to content

[core] Support partition predicate pushdown for PartitionsTable#7628

Open
sundapeng wants to merge 4 commits intoapache:masterfrom
sundapeng:feat/partitions-table-predicate-pushdown
Open

[core] Support partition predicate pushdown for PartitionsTable#7628
sundapeng wants to merge 4 commits intoapache:masterfrom
sundapeng:feat/partitions-table-predicate-pushdown

Conversation

@sundapeng
Copy link
Copy Markdown
Member

Purpose

PartitionsTable.withFilter() was a no-op (// TODO at line 189), causing full manifest scans when querying with partition filters like SELECT * FROM t$partitions WHERE partition = 'dt=20260410'. This adds partition predicate pushdown following the same pattern established by BucketsTable (#7592), FilesTable (#7376), ManifestsTable (#7310), and ConsumersTable (#7329).

The implementation uses a dual-path filtering strategy:

  • Catalog path: preserves catalog.listPartitions() call and filters results in memory, keeping metadata columns (created_at, created_by, updated_by, options) intact
  • TableScan fallback path: pushes predicate down to InnerTableScan.withPartitionFilter() for manifest-level pruning

Also refactors PartitionPredicateHelper.applyPartitionFilter() into a two-step build+apply pattern (buildPartitionPredicate() + apply), and extends parsePartitionSpec() to support PartitionsTable's key=value/key=value format in addition to the existing {value1, value2} format.

Tests

  • Equal filter on single partition key
  • IN filter on multiple partition values
  • No-match filter returns empty result
  • Non-partition column filter safely ignored
  • Multi-column partition keys with Equal and IN filters
  • Existing BucketsTableTest passes (verifies refactored helper is backward-compatible)

API and Format

No API or storage format changes.

Documentation

No documentation changes needed.

sundapeng and others added 3 commits April 11, 2026 08:30
PartitionsTable.withFilter() was a no-op (TODO), causing full manifest
scans when querying with partition filters. This adds predicate pushdown
following the same pattern as BucketsTable (apache#7592) and FilesTable (apache#7376).

Key changes:
- PartitionsScan extracts partition predicate via LeafPredicateExtractor
- PartitionsSplit carries the predicate to PartitionsRead
- Catalog path: in-memory filter preserving metadata columns
- TableScan path: manifest-level pushdown via withPartitionFilter
- PartitionPredicateHelper refactored to build+apply two-step pattern
- parsePartitionSpec extended for key=value/key=value format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…redicate

filterByPredicate used raw p.spec().get(key) which renders null as literal
"null", while toRow substitutes null with defaultPartitionName. This caused
predicate pushdown to fail matching null-valued partitions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
testPartitionPredicateFilterMultiColumnKeys created MultiPartTable directly
via filesystem (SchemaUtils.forceCommit), which works for local catalog but
fails for REST catalog since it's unaware of tables created outside its API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Object value =
TypeUtils.castFromString(
partSpec.get(partitionKeys.get(i)), partitionType.getTypeAt(i));
predicates.add(partBuilder.equal(i, value));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider default partitions here. If the value is the default partition name, this should become isNull(...) rather than equal(...).

…correctness

- Handle __DEFAULT_PARTITION__ in buildPartitionPredicate() by generating
  isNull() instead of castFromString() which throws NumberFormatException
  on non-string partition types (e.g. INT)
- Fix scan path to skip pushdown for unsupported predicates instead of
  returning empty results
- Pass defaultPartitionName through all system table callers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sundapeng
Copy link
Copy Markdown
Member Author

E2E Benchmark v2: 3 Rounds × 9 Samples (with bug fixes)

Environment

  • ECS EMR cn-hongkong (16 vCPU, 32GB), DLF REST Catalog, Spark 3.5.3, Java 11
  • 4 tables: 100 / 1,000 / 5,000 / 10,000 partitions
  • 3 full rounds per scenario, 3 internal runs per query = 9 samples per data point

Bug Fixes in Latest Push

  1. buildPartitionPredicate() handles __DEFAULT_PARTITION__isNull() (fixes NumberFormatException on INT partitions)
  2. Unsupported predicates skip pushdown instead of returning empty results
  3. defaultPartitionName passed through all system table callers

Catalog Path (DLF REST API)

Partitions Query Before (ms) After (ms) Speedup
100 no_filter 150 141 1.06x
100 equal_filter 180 171 1.05x
100 in_filter 132 126 1.05x
1,000 no_filter 152 157 0.97x
1,000 equal_filter 127 112 1.14x
1,000 in_filter 121 103 1.18x
5,000 no_filter 251 300 0.84x
5,000 equal_filter 247 229 1.08x
5,000 in_filter 240 221 1.09x
10,000 no_filter 410 404 1.02x
10,000 equal_filter 402 374 1.08x
10,000 in_filter 401 360 1.11x

Filtered avg at 10K: +8.6%

Scan Path (Time-Travel VERSION AS OF 1)

Partitions Query Before (ms) After (ms) Speedup
100 no_filter 223 223 1.00x
100 equal_filter 242 239 1.01x
100 in_filter 207 222 0.93x
1,000 no_filter 206 208 0.99x
1,000 equal_filter 167 156 1.07x
1,000 in_filter 168 161 1.04x
5,000 no_filter 235 256 0.92x
5,000 equal_filter 179 154 1.16x
5,000 in_filter 193 152 1.27x
10,000 no_filter 240 234 1.02x
10,000 equal_filter 199 142 1.40x
10,000 in_filter 201 156 1.29x

Filtered avg at 10K: +25.5%

Summary

Path Best Single Speedup 10K Filtered Avg
Catalog (DLF) 1.18x (1K in_filter) +8.6%
Scan (time-travel) 1.40x (10K equal_filter) +25.5%

No regression in unfiltered baselines. Improvement scales with partition count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants